46 research outputs found

    Improving lightly supervised training for broadcast transcription

    This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses decoding hypotheses derived automatically with a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality of standard lightly supervised hypotheses can be poor. To address this issue, word- and segment-level combination approaches are used between the lightly supervised transcripts and the original programme scripts, which yield improved transcriptions. Experimental results show that systems trained using these improved transcriptions consistently outperform those trained using only the original lightly supervised decoding hypotheses. This is shown to be the case for both the maximum likelihood and minimum phone error trained systems. The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This is the accepted manuscript version. The final version is available at http://www.isca-speech.org/archive/interspeech_2013/i13_2187.html

    Automatic transcription of multi-genre media archives

    This paper describes some recent results of our collaborative work on developing a speech recognition system for the automatic transcription of media archives from the British Broadcasting Corporation (BBC). The material includes a wide diversity of shows with their associated metadata. The latter are highly diverse in terms of completeness, reliability and accuracy. First, we investigate how to improve lightly supervised acoustic training when timestamp information is inaccurate and when speech deviates significantly from the transcription, and how to perform evaluations when no reference transcripts are available. An automatic timestamp correction method and word- and segment-level combination approaches between the lightly supervised transcripts and the original programme scripts are presented, which yield improved metadata. Experimental results show that systems trained using the improved metadata consistently outperform those trained with only the original lightly supervised decoding hypotheses. Secondly, we show that the recognition task may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we describe Multi-level Adaptive Networks, a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems including a PLP-based baseline, in-domain tandem features, and the best out-of-domain tandem features. This research was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This paper was presented at the First Workshop on Speech, Language and Audio in Multimedia, August 22-23, 2013; Marseille. It was published in CEUR Workshop Proceedings at http://ceur-ws.org/Vol-1012/

    The experience of admission to psychiatric hospital among Chinese adult patients in Hong Kong

    Background: The paper reports on a study to evaluate the psychometric properties and cultural appropriateness of the Chinese translation of the Admission Experience Survey (AES). Methods: The AES was translated into Chinese and back-translated. Content validity was established by focus groups and expert panel review. The Chinese version of the Admission Experience Survey (C-AES) was administered to 135 consecutively recruited adult psychiatric patients in the Castle Peak Hospital (Hong Kong SAR, China) within 48 hours of admission. Construct validity was assessed by comparing the scores from patients admitted voluntarily versus patients committed involuntarily, and those who received physical or chemical restraint versus those who did not. The relationship between admission experience and psychopathology was examined by correlating C-AES scores with the Brief Psychiatric Rating Scale (BPRS) scores. Results: Spearman's item-to-total correlations of the C-AES ranged from 0.50 to 0.74. Three factors from the C-AES were extracted using factor analysis. Item 12 was omitted because of poor internal consistency and factor loading. The factor structure of the Process Exclusion Scale (C-PES) corresponded to the English version, while some discrepancies were noted in the Perceived Coercion Scale (C-PCS) and the Negative Pressure Scale (C-NPS). All subscales had good internal consistencies. Scores were significantly higher for patients either committed involuntarily or subjected to chemical or physical restraint, independent of the severity of psychotic symptoms. Conclusion: The Chinese AES is a psychometrically sound instrument assessing the three different aspects of the experience of admission, namely "negative pressure", "process exclusion" and "perceived coercion". The potential of the C-AES in exploring the subjective experience of psychiatric admission and its effects on treatment adherence should be further explored.

    Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

    The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8 Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms). This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160).

    “A Hideous Torture on Himself”: Madness and Self-Mutilation in Victorian Literature

    This paper suggests that late nineteenth-century definitions of self-mutilation, a new category of psychiatric symptomatology, were heavily influenced by the use of self-injury as a rhetorical device in the novel, for the literary text held a high status in Victorian psychology. In exploring Dimmesdale’s “self-mutilation” in The Scarlet Letter in conjunction with psychiatric case histories, the paper indicates a number of common techniques and themes in literary and psychiatric texts. As well as illuminating key elements of nineteenth-century conceptions of the self, and the relation of mind and body through ideas of madness, this exploration also serves to highlight the social commentary implicit in many Victorian medical texts. Late nineteenth-century England, like mid-century New England, required the individual to help himself and, simultaneously, others; personal charity and individual philanthropy were encouraged, while state intervention was often presented as dubious. In both novel and psychiatric text, self-mutilation is thus presented as the ultimate act of selfish preoccupation, particularly in cases on the “borderlands” of insanity.

    Prediction of diabetic retinopathy: role of oxidative stress and relevance of apoptotic biomarkers

    Using sub-word-level information for confidence estimation with conditional random field models

    The task of word-level confidence estimation (CE) for automatic speech recognition (ASR) systems stands to benefit from the combination of suitably defined input features from multiple information sources. However, the information sources of interest may not necessarily operate at the same level of granularity as the underlying ASR system. The research described here builds on previous work on confidence estimation for ASR systems using features extracted from word-level recognition lattices, by incorporating information at the sub-word level. Furthermore, the use of Conditional Random Fields (CRFs) with hidden states is investigated as a technique to combine information for word-level CE. Performance improvements are shown using the sub-word-level information in linear-chain CRFs with appropriately engineered feature functions, as well as when applying the hidden-state CRF model at the word level.